Segmentation of a multi-column document

ABSTRACT

A method for detecting a logical structure of an image of a document. The method includes identifying objects in the image of the document, constructing a graph dividing the identified objects in the image of the document, detecting an optimal path through the graph to locate regions in the image of the document, and dividing the image of the document into the regions based at least in part on the detected optimal path.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority under 35 USC 119 to Russian Patent Application No. 2014101125, filed Jan. 15, 2014; the disclosure of which is incorporated herein by reference.

FIELD OF TECHNOLOGY

The subject matter of the present application relates to a method and system in the field of image processing, specifically the analysis of documents with a complex spatial layout.

BACKGROUND

Recognition of the internal content of photographed or scanned documents is a challenge at present. The basic principle of existing segmentation methods is to search for a document's basic (logical) structure such as column regions, incuts, and headers, and to analyze the interior of the identified regions (for example, identification of text lines in a column region). Known segmentation methods are often able to identify only rectangular blocks in a document (e.g. rectangular columns of text and incuts with primitive shapes). Such methods need the breaks between columns and incuts to be sufficiently large to correctly determine whether a given fragment of the document belongs to a column or an incut. Incorrect classification occurs when there are isolated objects (for example, sparse table cells) inside columns or incuts and when the shape of regions is not rectangular.

Existing methods of document analysis are unable to accurately and reliably perform segmentation of document images. Therefore, there is a need to develop a new method to analyze the internal content of documents.

SUMMARY

In one aspect, the present disclosure is related to a method for detecting a basic (logical) structure of an image of a document. The method includes identifying objects in the image of the document, constructing an area diagram for the identified objects, constructing a division graph based on the area diagram for the identified objects, and constructing an adjacency graph based on the division graph. In some implementations, constructing the adjacency graph includes assigning graph vertices corresponding to the identified objects, identifying pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph, and joining each pair of adjacent objects by an edge of the adjacency graph. The method further includes identifying graph vertices and graph edges on the area diagram and creating a minimal graph based on the identified graph vertices and graph edges. The method further includes detecting an optimal path through the graph to locate regions in the image of the document, and dividing the image of the document into the regions based at least in part on the detected optimal path.

In some implementations, the method further includes assigning weight and/or penalty values to edges of the division graph. In an implementation, the weight values are based upon a comparison of at least two identified objects. The method further includes summing the assigned weight values to determine a total weight value to detect the optimal path. In an illustrative embodiment, the optimal path is a path with the best total weight value. The method further includes constructing a sub-graph of the division graph to restrict a search area for the needed path, identifying connected components in the sub-graph, constructing at least one path relative to the identified connected components, and filtering the paths. In some implementations, the method further includes correcting the constructed path by adding one or more patches. The method further includes removing at least one graph edge from the adjacency graph corresponding to the optimal path, identifying connected components in the adjacency graph, constructing regions relative to the connected components in the adjacency graph, and dividing the image of the document into the constructed regions.

In another aspect, the present disclosure is related to a system to detect a basic structure of a document. The system includes a memory configured to store processor-executable instructions and a processor operatively coupled to the memory. In some implementations, the processor is further configured to identify objects in the image of the document, construct an area diagram for the identified objects, construct a division graph based on the area diagram, and construct an adjacency graph based on the division graph. In some implementations, the processor is further configured to assign graph vertices corresponding to the identified objects, identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph, and join each pair of adjacent objects by an edge of the adjacency graph. The processor is further configured to identify graph vertices and graph edges on the area diagram, and create a minimal graph based on the identified graph vertices and graph edges. The method further includes detecting an optimal path through the graph to locate regions in the image of the document, and dividing the image of the document into the regions based at least in part on the detected optimal path.

In some implementations, the processor is further configured to assign weight or penalty values to edges of the division graph. In an implementation, the weight values are based upon a comparison of at least one pair of identified objects. The processor is further configured to sum the assigned weight values to determine a total weight value to detect the optimal path. In some implementations, the optimal path is a path with the best total weight value. The processor is further configured to construct a sub-graph of the division graph to restrict a search area for the needed path, identify connected components in the sub-graph, construct at least one path relative to the identified connected components, and filter the paths. In some implementations, the processor is further configured to correct the constructed path by adding one or more patches. The processor is further configured to remove at least one graph edge from the adjacency graph corresponding to the optimal path, identify connected components in the adjacency graph, construct regions relative to the connected components in the adjacency graph, and divide the image of the document into the constructed regions.

In another aspect, the present disclosure is related to a non-transitory computer-readable storage medium having computer-readable instructions stored therein, the instructions being executable by a processor of a computing system. The instructions further include instructions to identify objects in the image of the document, instructions to construct an area diagram for the identified objects, instructions to construct a division graph based on the area diagram, and instructions to construct an adjacency graph based on the division graph. In some implementations, the instructions further include instructions to assign graph vertices corresponding to the identified objects, instructions to identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph, and instructions to join each pair of adjacent objects by an edge of the adjacency graph. The instructions further include instructions to identify graph vertices and graph edges on the area diagram and instructions to create a minimal graph based on the identified graph vertices and graph edges. The method further includes detecting an optimal path through the graph to locate regions in the image of the document, and dividing the image of the document into the regions based at least in part on the detected optimal path.

In some implementations, the instructions further include instructions to assign weight or penalty values to edges of the division graph. In an implementation, the penalties are based upon a comparison of mutual characteristics of at least one pair of identified objects. The instructions further include instructions to sum the assigned weight values to determine a total weight value to detect the optimal path. In some implementations, the optimal path is a path with the best total weight value. The instructions further include instructions to construct a sub-graph of the division graph to restrict a search area for the needed path, instructions to identify connected components in the sub-graph, instructions to construct at least one path relative to the identified connected components, and instructions to filter the paths. In some implementations, the instructions further include instructions to correct the constructed path by adding one or more patches. The instructions further include instructions to remove at least one graph edge from the adjacency graph corresponding to the at least one optimal path, instructions to identify connected components in the adjacency graph, instructions to construct regions relative to the connected components in the adjacency graph, and instructions to divide the image of the document into the constructed regions.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects, features, and advantages of the disclosure will become more apparent and better understood by referring to the following description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a flow diagram of a method for detecting a logical structure of a document in accordance with an illustrative embodiment;

FIG. 2 is a flow diagram of a method for constructing an area diagram in accordance with an illustrative embodiment;

FIG. 3 is a flow diagram of a method for constructing a division graph in accordance with an illustrative embodiment;

FIG. 4 is a flow diagram of a method for detecting an optimal path in accordance with an illustrative embodiment;

FIG. 5 is a flow diagram of a method for dividing a document into regions in accordance with an illustrative embodiment;

FIG. 6 illustrates an example of an image of a document with distorted columns in accordance with an illustrative embodiment;

FIG. 7 illustrates an example of an image of a document with incuts in accordance with an illustrative embodiment;

FIG. 8 illustrates examples of objects in an image of a document in accordance with an illustrative embodiment;

FIG. 9 illustrates a first example of a graph of objects in an image of a document in accordance with an illustrative embodiment;

FIG. 10 illustrates a second example of a graph of objects in an image of a document in accordance with an illustrative embodiment; and

FIG. 11 is a block diagram of a system for detecting a logical structure of a document according to an illustrative embodiment.

DETAILED DESCRIPTION

In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details. In other instances, structures and devices are shown only in block diagram form in order to avoid obscuring the invention.

Reference in this specification to “one implementation” or “an implementation” means that a particular feature, structure, or characteristic described in connection with the implementation is included in at least one implementation of the invention. The appearances of the phrase “in one implementation” in various places in the specification are not necessarily all referring to the same implementation, nor are separate or alternative implementations mutually exclusive of other implementations. Moreover, various features are described which may be exhibited by some implementations and not by others. Similarly, various requirements are described which may be requirements for some implementations but not other implementations.

Segmentation is an important step in document image recognition. The segmentation process makes it possible to identify different regions on document images. Regions may be columns, pictures, tables, text blocks, headers, etc.

The present disclosure is directed to methods and systems to discover (or identify) the basic structure of a document. In some implementations, the document may contain text columns, images, tables, etc. The present disclosure is also directed to the discovery (identification) of the borders of objects of any complexity located in a document, e.g. borders of objects in the document may be partially or completely cut into columns. In some implementations, the discovery of the borders includes the discovery of gaps between columns, for example in an image of a document. The gaps in images of documents may be curved, narrow, or distorted. In some implementations, the objects in a document may be arbitrarily shaped, which increases the difficulty in identifying borders. An example of a document with curved columns 62 is shown in FIG. 6. A document may include various regions such as: text blocks, pictures, columns, tables, diagrams, etc. The methods and systems described herein may divide the document into these regions to properly understand their logical relationship.

There may be two types of regions in a document: column regions and non-column regions. Column regions may be understood to be sections of the document containing text that is shaped like a column. A document may have one or more columns of text. Non-column regions, in some implementations, may be sections of a document containing pictures, framed text, headers, or tables and may be referred to hereinafter as “incuts.”

An incut may be a region of the document that contains text, a framed text, a header, a table, and/or a picture. In some implementations, the main feature of an incut is that it partially or completely cuts into a column of the main text. Hereinafter, an incut that partially cuts into a column may be referred to as “partial.” Additionally, hereinafter, an incut that completely cuts into a column may be referred to as “complete.” Note that the main text of a column may be located either to the left or to the right of a partial incut. Thus, there may be two types of partial incuts: a right partial incut and a left partial incut.

FIG. 7 illustrates an example of an image of a document with incuts. In FIG. 7, an image of a document includes an incut (71), a first column (72), a middle column (73), a third column (74), a complete incut (75) and a border of the middle column (76). The complex incut (71) in FIG. 7 is an incut for each column. For the first column (72), the incut (71) is a partial right (rectangular) incut. For the middle column (73), the incut (71) is a complete incut. For the third column (74), the incut (71) is a partial left (non-rectangular) incut.

In some implementations, incuts may be located at the top, bottom, or middle of a text column. For an upper incut, the main text may be located below the incut. Thus, the incut is adjacent to the upper part of the context. For a lower incut, the main text may be located above the incut, and the incut may be adjacent to the lower part of the context. In the case of a middle incut, the text may be located above, below, and/or to one side (to the right or left), depending on whether the incut is a right incut or a left incut. For example, the “International Business” incut (75) in FIG. 7 is an upper (complete) incut for all of the columns. In some implementations, middle complete incuts can split a column into two disconnected parts. In one implementation, headers may be considered as a special case of upper complete incuts.

FIG. 1 shows a flow diagram of a method for detecting a logical structure of a document. The method may be implemented on a computing device (e.g., a user device). In one implementation, the method is encoded on a computer-readable medium that contains instructions that, when executed by the computing device, cause the computing device to perform operations of the method. At block 100, an image of a document is received. In some implementations, the image of the document (or frame) may be obtained electronically using any one of the known methods. In an implementation, the image may be obtained from the memory of an electronic device, or from any other accessible sources. The image of the document may be received by a processor. The processor may be configured to receive the image of the document and perform the methods described herein.

At block 105, objects in the image of the document are identified. In some implementations, in order to find the borders between regions, objects can first be identified in the document (105). Objects in a document can, for example, be fragments of text lines, lines, picture fragments, pictures, words, separators, etc. FIG. 8 illustrates examples of text objects (82) in an image of a document. FIG. 8 depicts fragments of text lines, which may be objects (82). Object borders (84) are denoted with a black line.

In the present disclosure, the image of the document is segmented into regions using an area Voronoi diagram (hereinafter “AVD”), and the identified objects are used as the Voronoi areas. Referring back to FIG. 1, at block 110, an AVD is constructed. An area Voronoi diagram may be constructed by any convenient method. For the purposes of segmentation, an approximate area Voronoi diagram may be constructed using, for example, the method shown in FIG. 2.

FIG. 2 shows a flow diagram of a method for constructing an approximate area Voronoi diagram. At block 200, an image with identified objects is received. At block 210, each object is initially approximated with a set of points (210). In some implementations, the set of points may be referred to as approximating points. FIG. 8 shows examples of objects (82) and one of the ways to approximate them with a set of points (86). After approximating points have been found for all objects, at block 220, a Voronoi diagram is built for them (220). The set of points in the built Voronoi diagram that are located closer to a given approximating point than to any of the others may be referred to as the Voronoi cell of that approximating point. At block 230, the borders between cells belonging to the same object (230) are eliminated. At block 240, the resulting diagram is an approximate area Voronoi diagram. It is worth noting that the AVD contains useful supplemental information related to the topology of objects. For example, it may be used to refine the borders between objects. In some implementations, this utilizes a special data structure referred to as a division graph.
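The steps of FIG. 2 can be illustrated on a small pixel grid. The following is a minimal sketch, not the disclosed implementation: it assumes a raster approximation in which every pixel takes the label of the object owning its nearest approximating point, which merges the point-level Voronoi cells of one object in a single pass. The function names `approximate_avd` and `border_pairs` and the input layout are hypothetical.

```python
from math import hypot

def approximate_avd(width, height, objects):
    """Label each pixel with the object that owns the nearest approximating point.

    `objects` maps a (hypothetical) object id to a list of (x, y) approximating points.
    """
    points = [(x, y, obj_id) for obj_id, pts in objects.items() for x, y in pts]
    labels = []
    for py in range(height):
        row = []
        for px in range(width):
            # The nearest approximating point defines the point-level Voronoi cell;
            # keeping only its object id drops borders between cells of one object.
            _, owner = min((hypot(px - x, py - y), obj_id) for x, y, obj_id in points)
            row.append(owner)
        labels.append(row)
    return labels

def border_pairs(labels):
    """Return the pairs of objects whose areas touch in the labelled grid."""
    pairs = set()
    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < w and ny < h and labels[y][x] != labels[ny][nx]:
                    pairs.add(frozenset((labels[y][x], labels[ny][nx])))
    return pairs
```

A brute-force nearest-point search is used only to keep the sketch short; a real implementation would more likely build the point Voronoi diagram geometrically and then delete the intra-object borders, as described above.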

FIG. 3 shows a flow diagram of a method for constructing a division graph (120, at FIG. 1). At block 300, an area Voronoi diagram is received. In some implementations, the edges in a division graph signify continuous sections of borders between Voronoi cells, for example Voronoi cells in the area Voronoi diagram. The edges in the division graph may begin and end where the corresponding border lines branch. A branch may be a point where three or more Voronoi cells meet. In some implementations, an edge supplies additional information; for example, the edge may signify the pair of objects divided by the corresponding border line.

FIG. 9 schematically illustrates a first example of a division graph. FIG. 9 shows the edge Eab (91), which divides adjacent cells A (92) and B (93). The vertices in the division graph can be the branch points (102) and the terminal points (104) on the AVD, as schematically illustrated in FIG. 10, which illustrates a second example of a division graph. In some implementations, the vertices corresponding to terminal points may be referred to as terminal vertices. Terminal vertices can be vertices located along the border of the graph. In an implementation, only one edge can extend from a terminal vertex. In some implementations, if an object has a more or less random shape, then exactly 3 edges will originate from each non-terminal vertex. This follows from the properties of a Voronoi diagram.

Referring back to FIG. 3, at blocks 310 and 320, the branch points and terminal points on the AVD can be used when constructing the division graph. In some implementations, each of these points corresponds to a graph vertex (310), and each segment that divides the Voronoi cells corresponds to a graph edge (320). At block 330, the resulting graph may be reduced to a minimal homeomorphic graph. In some implementations, to reduce the resulting graph to a minimal homeomorphic graph, vertices with only two incident edges may be eliminated and the two corresponding edges may be joined into one (330). In one implementation, edges are joined only if they divide the same pair of objects. The resulting graph is a division graph (340).
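The reduction at block 330 can be sketched as follows. This is an illustrative sketch under assumed data structures: each edge is a hypothetical `(u, v, pair)` tuple, where `pair` is a frozenset naming the two objects the edge divides, and `reduce_division_graph` is an invented name.

```python
def reduce_division_graph(edges):
    """Remove vertices with exactly two incident edges, joining those edges.

    `edges` is a list of (u, v, pair) tuples; `pair` is a frozenset holding the
    two objects divided by the edge (an assumed representation).
    """
    edges = list(edges)
    changed = True
    while changed:
        changed = False
        incident = {}
        for i, (u, v, _) in enumerate(edges):
            incident.setdefault(u, []).append(i)
            incident.setdefault(v, []).append(i)
        for vertex, idxs in incident.items():
            # Skip vertices that are not of degree two, and self-loops.
            if len(idxs) != 2 or idxs[0] == idxs[1]:
                continue
            a, b = edges[idxs[0]], edges[idxs[1]]
            if a[2] != b[2]:
                continue  # join only edges that divide the same pair of objects
            far_a = a[1] if a[0] == vertex else a[0]
            far_b = b[1] if b[0] == vertex else b[0]
            edges = [e for i, e in enumerate(edges) if i not in idxs]
            edges.append((far_a, far_b, a[2]))
            changed = True
            break  # incidence lists are stale; rebuild them
    return edges
```

Rebuilding the incidence table after every merge is quadratic in the worst case; it is chosen here for clarity rather than speed.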

Now referring back to FIG. 1, at block 120, an adjacency graph is constructed. In some implementations, an adjacency graph of objects may be a dual representation of the AVD. Unlike the division graph, the vertices of the adjacency graph may have a well-defined physical meaning and correspond to the objects in the image of the document themselves. The edges of the adjacency graph connect adjacent objects, while the edges of the division graph divide them. In some implementations, a division graph may be used to construct an adjacency graph. In one embodiment, the procedure to construct an adjacency graph using a division graph may include identifying the vertices of the adjacency graph. In some implementations, the vertices of the adjacency graph may correspond to the objects identified in the document image, and two vertices of the adjacency graph may be joined by an edge if the objects that correspond to these vertices are divided by at least one edge in the division graph (the supplementary information of an edge of the division graph indicates the objects that it divides). In some implementations, it is not possible to unambiguously reconstruct a division graph from an adjacency graph of objects. This is due to the fact that it may not be possible to know from the adjacency graph whether or not two edges in the division graph must be adjacent (i.e. share a common vertex).
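The construction described above can be sketched in a few lines. The sketch assumes, hypothetically, that each division-graph edge is stored as a `(u, v, pair)` tuple whose `pair` is a frozenset of the two objects it divides; `build_adjacency_graph` is an illustrative name.

```python
def build_adjacency_graph(objects, division_edges):
    """Join two objects whenever at least one division-graph edge divides them."""
    adjacency = {obj: set() for obj in objects}  # one vertex per identified object
    for _u, _v, pair in division_edges:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency
```

Because several division-graph edges may divide the same pair of objects, the sets collapse them into a single adjacency edge, which is one reason the division graph cannot be recovered from the adjacency graph.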

Still referring to FIG. 1, at block 140, a path search is conducted. The path search can be conducted after the division graph and the adjacency graph have been constructed, where the path search is a search for paths in the division graph.

FIG. 4 shows a flow diagram of a method for detecting an optimal path. In some implementations, regions in a document may be text blocks, pictures, columns, tables, diagrams, etc. Most frequently, the borders between regions in documents are rather large areas of whitespace, for example, the separation between columns or the borders of incuts, which are generally continuous. In terms of the division graph, such a space may be a rather long continuous path (i.e. a sequence of adjacent edges) in the division graph.

In an implementation, a path in the division graph can be a sequence of edges in which the end of one edge (a vertex) is the beginning of another edge. In some implementations, a path in the division graph may terminate either with a terminal vertex or with a vertex from some other path. FIG. 7 illustrates an example of an image of a document, and it is readily apparent that a set of paths (76) will “cut” the document, thereby splitting its context into the desired regions.

In an implementation, to correctly split the document into regions, an optimal path is defined. To do this, referring back to FIG. 4, at block 400, the edges of the division graph are weighted. In some implementations, the edges of the division graph are weighted based on an analysis of the consistency of the objects that the edges of the division graph pass between. The analysis may also include a comparison of characteristics of the identified objects in the image of the document. For example and without limitation, the analysis may include a comparison of the objects' text quality, the position of baselines, height, etc. In some implementations, based on the results, an edge separating the objects being examined may be assigned a weight or a penalty. In one implementation, the penalty (weight) may be a certain numeric value. In some implementations, the weight values may be based upon analyzing mutual and/or similar characteristics of at least one pair of identified objects. For example, if two objects are very similar to pieces of a line, then an edge that separates these two objects may be assigned a large penalty. In other implementations, if the types of two objects are not consistent (e.g. a text object and a picture), then the edges between these two objects may be assigned a smaller penalty.
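A penalty function of the kind described above might look like the following. This is only a sketch: the `PageObject` fields, the tolerance factors, and all numeric penalty values are assumptions chosen for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class PageObject:
    kind: str        # "text" or "picture" (hypothetical labels)
    height: float    # object height in pixels
    baseline: float  # y coordinate of the text baseline

def edge_penalty(a, b, same_line_penalty=10.0, mixed_type_penalty=1.0,
                 default_penalty=4.0):
    """Return a penalty for the division-graph edge separating objects a and b."""
    if a.kind != b.kind:
        # Inconsistent types (e.g. text next to a picture): a likely region
        # border, so the edge is cheap to include in a path.
        return mixed_type_penalty
    if a.kind == "text":
        tallest = max(a.height, b.height)
        similar_height = abs(a.height - b.height) <= 0.2 * tallest
        same_baseline = abs(a.baseline - b.baseline) <= 0.25 * tallest
        if similar_height and same_baseline:
            # The two fragments look like pieces of one text line, so cutting
            # between them is heavily penalized.
            return same_line_penalty
    return default_penalty
```

For example, two text fragments of equal height on the same baseline receive the large penalty, while a text fragment beside a picture receives the small one.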

In some implementations, the size of the penalty may depend on the distance between objects. In the general case, for various subtasks the penalties may differ. For example, in one implementation, when searching for specific types of incuts, one may expect that there is column text on at least one side of an edge, so that edges that separate non-textual objects may have a large penalty. In another implementation, a large penalty would not be appropriate when searching for paths between columns (inter-column paths), because it may be possible that an edge separates two pictures, each of which belongs to its own column.

In some implementations, finding the paths in the division graph assumes that a path must pass along a border between neighboring objects in the document. Because the distance between neighboring regions is greater than, for example, the distance between text lines in a single column, once the edges of the division graph have been weighted, the edges dividing neighboring regions have a sufficiently small weight (penalty) relative to the edges dividing objects within a single region. Thus, in an implementation, to find a path that passes along the border between regions, it is desirable to find the path with the smallest penalty. In some implementations, the path with the smallest penalty may be an optimal path. In an implementation, the optimal path may be the path with the best total weight value. The best total weight value may be the smallest total penalty or the highest total score value.

In an implementation, the process of constructing the optimal path may begin with a search for “good” edges, or edges with terminal vertices. A “good” edge may refer to an edge with the smallest penalty. In one implementation, a “good” edge may refer to an edge with a penalty below a pre-determined threshold. Initially, a path has no penalties, because it includes no edges. With the addition of each edge, the path's penalty increases by the magnitude of the added edge's penalty. In some implementations, Dijkstra's algorithm, for example, may be used to find the path with the smallest penalty. Thus, in an implementation, the path obtained by adding the edges with the smallest penalty may be the border between two or more regions of an image of a document. In some implementations, the path obtained by adding the edges with the smallest penalty may be between terminal vertices and/or sections of other paths in the document image. In general, any scoring system may be selected to sum, add, total, and/or quantify a score value for an edge. For example, in one implementation, instead of penalties, the quality of an edge and/or path could be scored. Thus, one may construct the path by adding edges with the highest weight (score) of quality. The method described next uses a penalty system as an example.
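As an illustration of the smallest-penalty search, the following sketch applies Dijkstra's algorithm to a division graph whose edges are given as hypothetical `(u, v, penalty)` tuples with non-negative penalties. The function name and the choice to stop at the first target vertex reached are assumptions of the sketch.

```python
import heapq

def smallest_penalty_path(edges, source, targets):
    """Return (total_penalty, vertices) for the cheapest path from `source`
    to any vertex in `targets`, or None if no target is reachable."""
    graph = {}
    for u, v, penalty in edges:  # the division graph is undirected here
        graph.setdefault(u, []).append((v, penalty))
        graph.setdefault(v, []).append((u, penalty))
    best = {source: 0.0}          # smallest known total penalty per vertex
    previous = {}                 # back-pointers for path reconstruction
    heap = [(0.0, source)]
    while heap:
        total, vertex = heapq.heappop(heap)
        if total > best.get(vertex, float("inf")):
            continue  # stale heap entry
        if vertex in targets:
            path = [vertex]
            while path[-1] != source:
                path.append(previous[path[-1]])
            return total, path[::-1]
        for neighbour, penalty in graph.get(vertex, []):
            candidate = total + penalty  # the path penalty grows by the edge penalty
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                previous[neighbour] = vertex
                heapq.heappush(heap, (candidate, neighbour))
    return None
```

Here `targets` would typically hold terminal vertices or vertices of already constructed paths, since a path may end at either.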

In some implementations, a subgraph of the division graph may be created. A subgraph may be a graph that contains some subset of the vertices of the division graph and some subset of the edges incident to them in the division graph. In some implementations, a subgraph may be defined in an area of the document image through which the desired path is surmised to pass. In such implementations, the path search will take place only within the subgraph. One of the advantages of using subgraphs is that the process of discovering regions in a document can be accelerated.

The method presented makes it possible to segment documents of arbitrary complexity and with the most diverse logical structure. Constructing paths based on a multi-column page with incuts, similar to the document illustrated in FIG. 7, can be difficult. This sort of structure is typical of magazines and newspapers and is one of the most complex segmentation tasks. An example of constructing inter-column paths and paths to isolate incuts is described below.

In some implementations, path construction may include several stages. First, one or more inter-column graphs (i.e. sub-graphs) may be constructed. In some implementations, the inter-column graphs are constructed using the division graph. To construct the inter-column sub-graph, various hypotheses as to how to partition the page into columns may be tested. In doing so, it is recognized that columns may be uniform or varied in width. In some implementations, the inter-column graph may be searched for edges with small penalties, for example, penalties that are less than a certain threshold value. In an implementation, edges with small penalties may be most suitable to serve as borders between two columns. In some implementations, the edges identified with small penalties may be a section of the future optimal path.

In an implementation, to optimize the search for the separating path, an undirected graph may be turned into a directed graph. Initially, the sub-graph may be undirected, for example, the sub-graph's edges may not have a direction. In order to produce a directed graph from an undirected graph, each undirected edge can be replaced with two directed edges, where the two directed edges are pointed in opposite directions. In some implementations, a search is conducted, for example, for paths that move only from top to bottom. In one implementation, edges that move in the opposite direction may be eliminated. If no reliable sections of inter-column paths are found for a given column-partitioning hypothesis (i.e. path sections having edges with small penalties), another way of partitioning the document into columns may be considered.
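The conversion described above can be sketched as follows. The sketch assumes image coordinates in which y grows downward, edges given as hypothetical `(u, v, penalty)` tuples, and a `y_of` mapping from vertex to its y coordinate; keeping exactly horizontal edges in both directions is a choice made for the sketch, not something the text specifies.

```python
def directed_top_to_bottom(edges, y_of):
    """Replace each undirected edge by two directed edges, then drop those
    that move upward on the page (y is assumed to grow downward)."""
    directed = []
    for u, v, penalty in edges:
        for start, end in ((u, v), (v, u)):
            # Keep the direction that does not move toward the top of the page.
            if y_of[end] >= y_of[start]:
                directed.append((start, end, penalty))
    return directed
```

Restricting the search to downward edges roughly halves the number of directed edges a path search has to consider.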

In some implementations, in addition to inter-column paths, a search for complete and partial incuts may be conducted. In an implementation, partial incuts may be located in places where there are breaks in inter-column paths. Vertices where an inter-column path is broken may be treated as initial vertices of the path of partial incuts.

In places where no partial incuts are found, there is a chance of finding the path of complete incuts. In some implementations, to find the path of a complete incut, it may be desirable to examine edges located between the borders of the break in the inter-column paths. For example, it is possible to find edges whose weights are above the average weight (or whose penalties are below the average penalty) of edges within the column being analyzed (because the distance between the incut and the main column text is greater than the distance between the text lines inside the column).

In some implementations, to construct paths in the division graph, the method may include looking through the initial vertices and the end vertices (i.e., where the path ends) of the already constructed paths, trying to start a new path from one of these vertices. In other implementations, “good” edges may be identified and used as sections from which to construct a new path by successively adding edges that are adjacent to them.

Now referring to FIG. 4, at block 440, an inter-column graph is constructed. In some implementations, as has already been stated, a tentatively constructed inter-column graph (440) may be used when constructing inter-column paths. The inter-column graph may include only those edges that may be elements of an inter-column path. In an implementation, “horizontal edges” (i.e., edges that divide horizontally overlapping objects) may be eliminated from the graph. Additionally, in some implementations, edges that do not fall on the borders of the analyzed columns may be eliminated.

At block 450, the inter-column graph may be searched for all connected components. A connected component may refer to a set of vertices of the graph such that for any two vertices in the set there is a path from one to the other, and there is no path from a vertex in the set to a vertex that is not in the set. Next, at block 460, the inter-column paths may be compared to the connected components found. In some implementations, the inter-column paths may be compared to the connected components using Dijkstra's algorithm.
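As a non-limiting illustration of the connected-component definition given above, the search at block 450 can be sketched with a breadth-first traversal. The disclosure does not prescribe a particular search; breadth-first search, the string vertex labels, and the function name `connected_components` are assumptions of this sketch.

```python
from collections import defaultdict, deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(
    vertices: Iterable[Hashable],
    edges: Iterable[Tuple[Hashable, Hashable]],
) -> List[Set[Hashable]]:
    """Group vertices so that any two vertices in a group are joined by a path."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen: Set[Hashable] = set()
    components: List[Set[Hashable]] = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first traversal collects everything reachable from `start`.
        seen.add(start)
        queue = deque([start])
        component: Set[Hashable] = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


components = connected_components(
    ["a", "b", "c", "d", "e"], [("a", "b"), ("b", "c"), ("d", "e")]
)
```

The example yields two components, {a, b, c} and {d, e}, in the order in which their first vertices appear in the input.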

At block 470, all or part of the paths may be filtered. In some implementations, paths may be filtered based on absolute and/or relative characteristics. In an implementation, when using absolute characteristics, short, curved paths and/or paths with large penalties may be considered suspect and may be subject to filtering. In some implementations, relative filtering accounts for the fact that a given interval may contain several paths that are good but incompatible with one another. For example, in one implementation, if a table is located under an inter-column path, the table may contain a good long vertical path that is horizontally incompatible with the inter-column path. In some implementations, which of two paths is spurious may be determined, for example, by a comparative analysis that considers which path is closer horizontally to the center of the inter-column division.
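By way of example only, absolute and relative filtering can be sketched as follows. The disclosure gives no numeric thresholds and no curvature measure; the `CandidatePath` type, the length-to-chord ratio used as a curvature measure, the default thresholds, the treatment of shortness and curvature as separate tests, and the mean horizontal offset used for the relative comparison are all assumptions of this sketch.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidatePath:
    points: List[Tuple[float, float]]  # polyline vertices as (x, y)
    penalty: float                     # total penalty of the path's edges


def path_length(path: CandidatePath) -> float:
    return sum(math.dist(a, b) for a, b in zip(path.points, path.points[1:]))


def curvature_ratio(path: CandidatePath) -> float:
    """Path length divided by the straight distance between its end points."""
    chord = math.dist(path.points[0], path.points[-1])
    return path_length(path) / chord if chord > 0 else math.inf


def passes_absolute_filter(path: CandidatePath, min_length: float = 50.0,
                           max_ratio: float = 1.2,
                           max_penalty: float = 10.0) -> bool:
    # Hypothetical thresholds: reject short, curved, or heavily penalized paths.
    return (path_length(path) >= min_length
            and curvature_ratio(path) <= max_ratio
            and path.penalty <= max_penalty)


def closest_to_center(paths: List[CandidatePath], center_x: float) -> CandidatePath:
    """Relative filtering: keep the path horizontally closest to the division center."""
    def mean_offset(path: CandidatePath) -> float:
        return sum(abs(x - center_x) for x, _ in path.points) / len(path.points)
    return min(paths, key=mean_offset)


inter_column = CandidatePath([(100, 0), (100, 200)], penalty=2.0)
table_rule = CandidatePath([(140, 200), (140, 400)], penalty=1.0)
curved = CandidatePath([(0, 0), (30, 40), (0, 80)], penalty=2.0)
kept = closest_to_center([inter_column, table_rule], center_x=105)
```

In the example, the curved path is rejected by the absolute filter, and of the two incompatible vertical paths the one nearer the assumed division center at x = 105 is kept.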

Still referring to FIG. 4, at block 410, a tentatively constructed column sub-graph, which includes only those edges that can separate an incut from a column of text, may be used during the construction of paths of partial incuts. In some implementations, the creation of such a sub-graph may account for the fact that at least one object adjacent to an edge of the searched path will most likely be a text object (text of a column).

The process of constructing paths of partial incuts (420) seeks to find a path between the broken sections of a found inter-column path. The process (420) may successively add edges located between the vertices of the broken inter-column path to construct a path that connects the vertices. In an implementation, if an upper/lower incut is searched for, the process seeks to find a path from an initial/end vertex of the appropriate inter-column path to one of the terminal vertices in the column sub-graph.

First, the path of a partial incut is searched for using Dijkstra's algorithm. In some implementations, the path suggested by Dijkstra's algorithm may be referred to as a base path. After the base path (420) has been found, at block 430, the base path may be corrected. In an implementation, correcting the base path may include expanding the base path (430). In some implementations, the base path may need to be corrected because the incut may be sparse (for example, it may consist of objects located far from one another) or it may be a compound incut (e.g., a picture and its caption). If the incut is sparse, the base path may be incorrectly detected, i.e., the path could erroneously pass through the incut.
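As a non-limiting illustration, a base path search with Dijkstra's algorithm can be sketched as follows. The dictionary-based graph layout, the use of edge penalties as non-negative costs, the set of terminal vertices, and the function name `find_base_path` are assumptions of this sketch rather than details of the disclosure.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Assumption: graph maps a vertex label to a list of (neighbor, penalty) pairs.
Graph = Dict[str, List[Tuple[str, float]]]


def find_base_path(graph: Graph, source: str,
                   terminals: Set[str]) -> Optional[Tuple[float, List[str]]]:
    """Return (total penalty, vertex list) of the cheapest path to any terminal."""
    best = {source: 0.0}
    previous: Dict[str, str] = {}
    visited: Set[str] = set()
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node in terminals:
            # Walk the predecessor links back to the source.
            path = [node]
            while path[-1] != source:
                path.append(previous[path[-1]])
            path.reverse()
            return cost, path
        for neighbor, penalty in graph.get(node, []):
            new_cost = cost + penalty
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return None  # no terminal vertex is reachable


graph = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.0), ("t", 5.0)],
    "b": [("t", 1.0)],
}
result = find_base_path(graph, "s", {"t"})
```

In this example the cheapest path from the broken vertex "s" to the terminal vertex "t" runs through "a" and "b" with a total penalty of 3.0. Because the search minimizes penalties only, it has no notion of sparse or compound incuts, which is why a later correction step may be needed.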

To prevent the base path from passing through a sparse incut, a patch technique is proposed. In some implementations, patches may be used to adjust the constructed path, for example, in the case of sparse incuts. A patch may be an edge or a section of a path with a small penalty that is a candidate for inclusion in the path. In some implementations, an indication that an edge or several edges should be added as a patch is, for example, the fact that the edge or edges divide objects that are essentially different (e.g., a part of a picture and a text object, or text objects with different heights).

After patches have been constructed, an attempt may be made to add the patches to the incut's base path. The base path with one or more patches added may be referred to as a final path. In some implementations, the system may add several patches to find the final path. In an implementation, a set of hypotheses of patch paths can be created, and the best among them can be selected. This selection may be based on two criteria. First, the selection may be based on the concept that it is best for the incut to encompass as large a territory as possible. Second, the selection may be based on the penalties of the patches, because the final path should be “good” and should therefore contain edges with small penalties.
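By way of example only, the selection among patch hypotheses can be sketched as follows. The disclosure names the two criteria but not how they are combined; the linear score (enclosed area minus a weighted penalty), the `PatchHypothesis` type, and the `penalty_weight` parameter are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PatchHypothesis:
    name: str
    enclosed_area: float   # territory the incut would encompass
    total_penalty: float   # sum of penalties of the base path plus patches


def select_final_path(hypotheses: List[PatchHypothesis],
                      penalty_weight: float = 1.0) -> PatchHypothesis:
    """Prefer a large enclosed territory and a small total penalty.

    Assumption: the two criteria are traded off linearly; other combinations
    (for example, lexicographic ordering) would be equally consistent with
    the description.
    """
    def score(h: PatchHypothesis) -> float:
        return h.enclosed_area - penalty_weight * h.total_penalty
    return max(hypotheses, key=score)


hypotheses = [
    PatchHypothesis("base path only", enclosed_area=100.0, total_penalty=8.0),
    PatchHypothesis("base + caption patch", enclosed_area=160.0, total_penalty=10.0),
    PatchHypothesis("base + two patches", enclosed_area=170.0, total_penalty=40.0),
]
final = select_final_path(hypotheses)
```

With the default weight, the hypothesis that adds the caption patch wins: it encloses noticeably more territory than the base path at a small extra penalty, while the third hypothesis gains little area for a large penalty.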

Overall, the method of creating paths of complete incuts may be similar to the method of constructing paths of partial incuts. In some implementations, incut candidates may be searched for based on an enumeration of the end/initial vertices of the incut's path. In an implementation, the final candidate may be selected heuristically based on the following criteria: a) the distance from the location of a break in the inter-column paths (the smaller the distance, the more preferable the candidate path); b) the quality of the path itself (for example, the lower the likelihood that the candidate path “cuts” through a paragraph of columnar text, the better).

In some implementations, the identified text columns can be checked for the presence of incuts that do not protrude from the column borders. To do this, in an implementation, the content of the column may be analyzed for the presence of sections of separating paths with small penalties. If a column of text contains a “good” section of a path, then in some implementations, the column may contain a partial incut. In an implementation, an attempt to construct the path of a partial incut can be made based on this section.

FIG. 5 shows a flow diagram of a method for dividing a document into regions. The constructed system of paths alone is not enough to understand the logical structure of a document and to obtain a document segmentation result, so additional stages should be performed to identify regions within the document. In some implementations, after the system of paths in the division graph has been found, objects can be assigned to the regions to which they correspond. In an implementation, the adjacency graph (120) may be used for this purpose. The edges that correspond to the constructed system of paths are removed from the adjacency graph. Each edge of the division graph stores information about the objects that it divides (i.e., the objects located on both sides of the edge). For each edge of the paths (480), information regarding the objects that the edge divides may be extracted, and the corresponding edge of the adjacency graph, which joins these objects, may be removed (510). Next, at block 520, any known method may be used to identify connected components in the resulting adjacency graph. The connected components correspond to the desired regions and contain precisely those objects that belong to these regions. At block 530, the resulting regions are identified on the document image and displayed in the segmentation results. Finally, the image of the document can be divided into regions (160).
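As a non-limiting illustration, blocks 510 and 520 can be sketched as follows. The disclosure allows any known method for finding connected components; the union-find structure used here, the string object labels, and the function name `split_into_regions` are assumptions of this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def split_into_regions(objects: List[str],
                       adjacency_edges: Iterable[Tuple[str, str]],
                       path_edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Remove adjacency edges crossed by the paths, then group objects into regions.

    `path_edges` is assumed to list, for every edge of the constructed paths,
    the pair of objects that the edge divides.
    """
    # Block 510: adjacency edges joining divided objects are dropped.
    removed = {frozenset(pair) for pair in path_edges}

    # Block 520: union-find over the remaining adjacency edges.
    parent: Dict[str, str] = {obj: obj for obj in objects}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in adjacency_edges:
        if frozenset((a, b)) in removed:
            continue
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    regions: Dict[str, List[str]] = {}
    for obj in objects:
        regions.setdefault(find(obj), []).append(obj)
    return sorted(regions.values())


# Two text lines in a left column and two in a right column.
objects = ["L1", "L2", "R1", "R2"]
adjacency = [("L1", "L2"), ("R1", "R2"), ("L1", "R1"), ("L2", "R2")]
inter_column_path = [("L1", "R1"), ("R2", "L2")]
regions = split_into_regions(objects, adjacency, inter_column_path)
```

In the example, removing the two adjacency edges crossed by the inter-column path leaves two components, one per column, which correspond to the two regions.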

The foregoing description of illustrative embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or limiting with respect to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the disclosed embodiments. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.

FIG. 11 shows a system 1100 that may detect a logical structure of a document using the techniques described above, in accordance with some embodiments of the disclosure. The system 1100 typically includes at least one processor 1102 coupled to a memory 1104. The processor 1102 may represent one or more processors (e.g., microprocessors), and the memory 1104 may represent random access memory (RAM) devices comprising a main storage of the system 1100 and/or any supplemental levels of memory, e.g., cache memories, non-volatile or back-up memories (e.g., programmable or flash memories), read-only memories, etc. In addition, the memory 1104 may include memory storage physically located elsewhere in the system 1100 (e.g., any cache memory in the processor 1102) as well as any storage capacity used as a virtual memory (e.g., as stored on a mass storage device 1110).

In some implementations, the system 1100 has a number of inputs and outputs for communicating information externally. The system 1100 may include one or more user input devices 1106 (e.g., a keyboard, a mouse, a scanner, etc.) and a display 1108 (e.g., a Liquid Crystal Display (LCD) panel) for interfacing with a user/operator. For additional storage, the system 1100 may also include one or more mass storage devices 1110, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g., a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive, etc.) and/or a tape drive, among others. Furthermore, the system 1100 may include an interface with one or more networks 1112 (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet, among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the system 1100 typically includes suitable analog and/or digital interfaces between the processor 1102 and each of the components 1104, 1106, 1108 and 1112, as is well known in the art.

The system 1100 operates under the control of an operating system 1114 and executes various computer software applications, components, programs, objects, modules, etc., indicated collectively by reference number 1116, to perform the techniques described above.

In general, the routines executed to implement the embodiments of the disclosure may be used as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.” The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer, and that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements involving the various aspects of the disclosure. Moreover, while the invention has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the disclosure are capable of being distributed as a program product in a variety of forms, and that the disclosure applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs), etc.), among others, and transmission type media such as digital and analog communication links. Computer-readable media, as included within the present disclosure, include only non-transitory media (i.e., do not include transitory signals-in-space).

Although the present disclosure has been provided with reference to specific exemplary embodiments, it is evident that various modifications can be made to these embodiments without changing the initial spirit of the invention. Accordingly, the specifications and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.

What is claimed:
 1. A method for detecting a logical structure of an image of a document, the method comprising: identifying objects in the image of the document; constructing a division graph of the identified objects in the image of the document; detecting at least one optimal path through the division graph to locate regions in the image of the document; and dividing the image of the document into the regions based at least in part on the detected optimal path.
 2. The method of claim 1, wherein constructing the division graph is performed based on constructing an area diagram for the identified objects.
 3. The method of claim 1, further comprising constructing an adjacency graph based on the division graph, wherein constructing the adjacency graph comprises: assigning graph vertices corresponding to the identified objects; identifying pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph; and joining each pair of adjacent objects by an edge of the adjacency graph.
 4. The method of claim 2, wherein constructing the division graph further comprises: identifying graph vertices and graph edges on the area diagram; and creating a minimal graph relative to the identified graph vertices and graph edges.
 5. The method of claim 1, wherein detecting the optimal path further comprises assigning weight values to edges of the division graph, wherein the weight values are based upon analyzing mutual characteristics of at least one pair of identified objects.
 6. The method of claim 5, further comprising summing the assigned weight values to determine a total weight value for paths in the division graph, wherein the at least one optimal path is a path with the best total weight value.
 7. The method of claim 1, wherein detecting the at least one optimal path comprises: constructing at least one sub-graph of the division graph; identifying connected components in the sub-graph; constructing at least one path relative to the identified connected components; and filtering the at least one path.
 8. The method of claim 1, wherein detecting the optimal path further comprises correcting the at least one optimal path by adding one or more patches.
 9. The method of claim 3, wherein dividing the image of the document into regions further comprises: removing at least one graph edge from the adjacency graph corresponding to the at least one optimal path; identifying connected components in the adjacency graph; constructing regions relative to the connected components in the adjacency graph; and dividing the image of the document in the constructed regions.
 10. A system to detect a logical structure of an image of a document, the system comprising: a memory configured to store processor-executable instructions; and a processor operatively coupled to the memory, wherein the processor is configured to: identify objects in the image of the document; construct a division graph of the identified objects in the image of the document; detect at least one optimal path through the division graph to locate regions in the image of the document; and divide the image of the document into the regions based at least in part on the detected optimal path.
 11. The system of claim 10, wherein the processor is further configured to construct an area diagram for the identified objects.
 12. The system of claim 10, wherein the processor is further configured to: construct an adjacency graph based on the division graph, wherein, to construct the adjacency graph, the processor is further configured to: assign graph vertices corresponding to the identified objects; identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph; and join each pair of the adjacent objects by an edge of the adjacency graph.
 13. The system of claim 11, wherein the processor is further configured to identify graph vertices and graph edges on the area diagram.
 14. The system of claim 10, wherein the processor is further configured to assign weight values to edges of the division graph, wherein the weight values are based upon analyzing mutual characteristics of at least one pair of identified objects.
 15. The system of claim 14, wherein the processor is further configured to sum the assigned weight values to determine a total weight value for paths in the division graph, wherein the at least one optimal path is a path with the best total weight value.
 16. The system of claim 10, wherein the processor is further configured to: construct at least one sub-graph of the division graph; identify connected components in the sub-graph; construct at least one path relative to the identified connected components; and filter the at least one path.
 17. The system of claim 10, wherein the processor is further configured to correct the at least one optimal path by adding one or more patches.
 18. The system of claim 12, wherein the processor is further configured to: remove at least one graph edge from the adjacency graph corresponding to the at least one optimal path; identify connected components in the adjacency graph; construct regions relative to the connected components in the adjacency graph; and divide the image of the document in the constructed regions.
 19. A non-transitory computer-readable storage medium having computer-readable instructions stored therein, the instructions being executable by a processor of a computing system, wherein the instructions comprise: instructions to identify objects in the image of the document; instructions to construct a division graph of the identified objects in the image of the document; instructions to detect an optimal path through the division graph to locate regions in the image of the document; and instructions to divide the image of the document into the regions based at least in part on the detected optimal path.
 20. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to construct an area diagram for the identified objects.
 21. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to construct an adjacency graph based on the division graph, wherein constructing the adjacency graph further comprises: instructions to assign graph vertices corresponding to the identified objects; instructions to identify pairs of adjacent objects corresponding to pairs of objects divided by at least one edge in the division graph; and instructions to join each pair of adjacent objects by an edge of the adjacency graph.
 22. The non-transitory computer-readable storage medium of claim 20, further comprising: instructions to identify graph vertices and graph edges on the area diagram; and instructions to create a minimal graph based on the identified graph vertices and graph edges.
 23. The non-transitory computer-readable storage medium of claim 19, further comprising instructions to assign weight values to edges of the division graph, wherein the weight values are based upon analyzing mutual characteristics of at least one pair of identified objects.
 24. The non-transitory computer-readable storage medium of claim 23, further comprising instructions to sum the assigned weight values to determine a total weight value for paths in the division graph, wherein the at least one optimal path is a path with the best total weight value.
 25. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to construct at least one sub-graph of the division graph; instructions to identify connected components in the sub-graph; instructions to construct at least one path relative to the identified connected components; and instructions to filter the at least one path.
 26. The non-transitory computer-readable storage medium of claim 19, further comprising: instructions to correct the at least one optimal path by adding one or more patches.
 27. The non-transitory computer-readable storage medium of claim 21, further comprising: instructions to remove at least one graph edge from the adjacency graph corresponding to the at least one optimal path; instructions to identify connected components in the adjacency graph; instructions to construct regions relative to the connected components in the adjacency graph; and instructions to divide the image of the document in the constructed regions.